15 research outputs found

    The CMS DBS Query Language

    The CMS experiment has implemented a flexible and powerful system enabling users to find data within the CMS physics data catalog. The Dataset Bookkeeping Service (DBS) comprises a database and the services used to store and access metadata related to CMS physics data. To this, we have added a generalized query system in addition to the existing web and programmatic interfaces to the DBS. This query system is based on a query language that hides the complexity of the underlying database structure by discovering the join conditions between database tables. This provides a way of querying the system that is simple and straightforward for CMS data managers and physicists to use, without requiring knowledge of the database tables or keys. The DBS Query Language uses the ANTLR tool to build the input query parser and tokenizer, followed by a query builder that uses a graph representation of the DBS schema to construct the SQL query sent to the underlying database. We describe the design of the query system, provide details of the language components, and give an overview of how this component fits into the overall data discovery system architecture.
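The core idea — discovering join conditions from a graph of the schema so the user never writes joins — can be sketched as a breadth-first search over table relationships. The table and column names below are illustrative, not the actual CMS DBS schema.

```python
from collections import deque

# Hypothetical fragment of a DBS-like schema graph: nodes are tables,
# edges carry the known foreign-key join condition between them.
JOINS = {
    ("dataset", "block"): "dataset.id = block.dataset_id",
    ("block", "file"): "block.id = file.block_id",
}

# Undirected adjacency list, so a path can be found in either direction.
ADJ = {}
for a, b in JOINS:
    ADJ.setdefault(a, []).append(b)
    ADJ.setdefault(b, []).append(a)

def join_path(start, goal):
    """Breadth-first search for the shortest chain of tables linking start to goal."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in ADJ.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def build_sql(select_col, start, goal):
    """Expand the discovered path into a SELECT with inferred join conditions."""
    path = join_path(start, goal)
    conds = [JOINS.get((a, b)) or JOINS[(b, a)] for a, b in zip(path, path[1:])]
    return "SELECT %s FROM %s WHERE %s" % (
        select_col, ", ".join(path), " AND ".join(conds))

# A query relating dataset to file is routed through block automatically.
print(build_sql("file.name", "dataset", "file"))
```

Here a user asks only for files belonging to a dataset; the intermediate block table and both join conditions are supplied by the path search, which is the property the abstract describes.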

    Job Life Cycle Management Libraries for CMS Workflow Management Projects

    Scientific analysis and simulation require the processing and generation of millions of data samples. These processing and generation tasks are often composed of multiple smaller tasks divided over multiple (computing) sites. This paper discusses the Compact Muon Solenoid (CMS) workflow infrastructure, and specifically the Python-based workflow library used for so-called task lifecycle management. The CMS workflow infrastructure consists of three layers: high-level specification of the various tasks based on input/output datasets, life cycle management of task instances derived from the high-level specification, and execution management. The workflow library is the result of a convergence of three CMS subprojects that respectively deal with scientific analysis, simulation, and real-time data aggregation from the experiment.
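Task lifecycle management of the kind described here is essentially a state machine over task instances. The following is a minimal sketch under that assumption; the state names and API are illustrative, not the actual CMS workflow library.

```python
# Legal transitions for a task instance; anything else is rejected.
# A failed task may be resubmitted, a complete task is terminal.
TRANSITIONS = {
    "created": {"submitted"},
    "submitted": {"running", "failed"},
    "running": {"complete", "failed"},
    "failed": {"submitted"},
    "complete": set(),
}

class Task:
    """One task instance derived from a high-level specification."""

    def __init__(self, name):
        self.name = name
        self.state = "created"

    def advance(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        return self.state

t = Task("merge_job_42")
for s in ("submitted", "running", "complete"):
    t.advance(s)
print(t.state)  # -> complete
```

Keeping the transition table as data rather than scattered conditionals makes it easy for an execution layer to drive many task instances through the same life cycle.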

    The CMS Integration Grid Testbed

    The CMS Integration Grid Testbed (IGT) comprises USCMS Tier-1 and Tier-2 hardware at the following sites: the California Institute of Technology, Fermi National Accelerator Laboratory, the University of California at San Diego, and the University of Florida at Gainesville. The IGT runs jobs using the Globus Toolkit with a DAGMan and Condor-G front end. The virtual organization (VO) is managed using VO management scripts from the European Data Grid (EDG). Grid-wide monitoring is accomplished using local tools such as Ganglia interfaced into the Globus Metadata Directory Service (MDS) and the agent-based MonALISA. Domain-specific software is packaged and installed using the Distribution After Release (DAR) tool of CMS, while middleware under the auspices of the Virtual Data Toolkit (VDT) is distributed using Pacman. During a continuous two-month span in the fall of 2002, over 1 million official CMS GEANT-based Monte Carlo events were generated and returned to CERN for analysis while being demonstrated at SC2002. In this paper, we describe the process that led to one of the world's first continuously available, functioning grids.
    Comment: CHEP 2003, MOCT01

    Software packaging with DAR

    One of the important tasks in distributed computing is delivering software applications to the computing resources. The Distribution After Release (DAR) tool is being used to package software applications for the worldwide event production by the CMS Collaboration. This presentation focuses on the concept of packaging applications based on the runtime environment. We discuss solutions for more effective software distribution based on two years' experience with DAR. Finally, we give an overview of the application distribution process and the interfaces to the CMS production tools.

    Contextual Constraint Modeling in Grid Application Workflows

    This paper introduces a new mechanism for specifying constraints in distributed workflows. By introducing constraints in a contextual form, it is shown how different people and groups within collaborative communities can cooperatively constrain workflows. A comparison with existing state-of-the-art workflow systems is made. These ideas are explored in practice with an illustrative example from High Energy Physics.
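The idea of contextual constraints — different groups cooperatively restricting workflows, each within its own scope — can be sketched as predicates paired with an applicability context. All names and policies below are illustrative assumptions, not the paper's actual mechanism.

```python
# Each constraint is (context, predicate): the predicate is enforced only
# on workflow steps where the context applies. Groups add constraints
# independently, and a step must satisfy every applicable one.
constraints = []

def add_constraint(context, predicate):
    constraints.append((context, predicate))

def allowed(step):
    """A step is allowed only if all constraints whose context matches it hold."""
    return all(pred(step) for ctx, pred in constraints if ctx(step))

# Site operators constrain where any job may run...
add_constraint(lambda s: True,
               lambda s: s["site"] in {"FNAL", "CERN"})
# ...while a physics group constrains only its own analysis steps.
add_constraint(lambda s: s["group"] == "higgs",
               lambda s: s["memory_mb"] <= 2000)

print(allowed({"site": "FNAL", "group": "higgs", "memory_mb": 1500}))  # True
print(allowed({"site": "FNAL", "group": "higgs", "memory_mb": 4000}))  # False
```

Because each constraint carries its own context, the two groups never have to edit a shared global policy, which is the cooperative property the abstract highlights.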

    The CMS Dataset Bookkeeping Service

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels, and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, a command-line interface, and a Discovery web page. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS, with authentication provided by Grid certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.
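The "loose federation" formed by moving catalog entries between service levels can be illustrated with a toy migration between a private and a global catalog. The data structures, dataset name, and migration policy here are assumptions for illustration, not the actual DBS migration tools.

```python
# Two service levels as simple catalogs: a local one for private analysis
# and a global one for CMS-wide access.
local_catalog = {
    "/Higgs/Sim-v1/RECO": {"files": 120, "provenance": "MC"},
}
global_catalog = {}

def migrate(name, src, dst):
    """Copy one dataset entry from src to dst, refusing to clobber an existing entry."""
    if name in dst:
        raise ValueError(f"{name} already present in destination catalog")
    dst[name] = src[name]
    return dst[name]

migrate("/Higgs/Sim-v1/RECO", local_catalog, global_catalog)
print("/Higgs/Sim-v1/RECO" in global_catalog)  # True
```

The entry remains in the local catalog after migration, so each service level keeps its own database while sharing entries on demand — a loose federation rather than a single central store.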